Boolean and ranked information retrieval for biomedical systematic reviewing
Abstract
Evidence-based medicine seeks to base clinical decisions on the best currently available scientific evidence and is becoming accepted practice. Systematic reviews play a key role: they synthesize the biomedical literature and rely on different information retrieval methods to identify a comprehensive set of relevant studies. Boolean retrieval, the primary retrieval method in this application domain, often excludes relevant documents from consideration. Ranked retrieval methods can mitigate this problem, but current approaches are either not applicable or do not perform as well as the Boolean method. In this thesis, a ranked retrieval model is identified that is both applicable to systematic review search and effective. The p-norm approach to extended Boolean retrieval, which generalizes the Boolean model but, to some extent, also introduces ranking, is found to be particularly promising: it identifies a greater fraction of relevant studies when typical numbers of documents are reviewed, and it also possesses properties important during the query formulation phase and for the overall retrieval process. Moreover, efficient methods available for ranked keyword retrieval models are adapted to extended Boolean models. The query processing methods presented in this thesis yield significant speedups, by a factor of 2 to 9, making this retrieval model an attractive choice in practice. Finally, to support the retrieval process during the subsequent update of systematic reviews, a query optimization method is devised that uses knowledge about the properties of relevant and irrelevant studies to boost the effectiveness of the search process.
Related resources
Boolean versus ranked querying for biomedical systematic reviews
BACKGROUND The process of constructing a systematic review, a document that compiles the published evidence pertaining to a specified medical topic, is intensely time-consuming, often taking a team of researchers over a year, with the identification of relevant published research comprising a substantial portion of the effort. The standard paradigm for this information-seeking task is to use Bo...
Facilitating Biomedical Systematic Reviews Using Ranked Text Retrieval and Classification
Searching and selecting articles to be included in systematic reviews is a real challenge for healthcare agencies responsible for publishing these reviews. The current practice of manually reviewing all papers returned by complex hand-crafted boolean queries is human labour-intensive and difficult to maintain. We demonstrate a two-stage searching system that takes advantage of ranked queries an...
Extended Boolean retrieval for systematic biomedical reviews
Searching for relevant documents is a laborious task involved in preparing systematic reviews of biomedical literature. Currently, complex Boolean queries are iteratively developed, and then each document of the final query result is assessed for relevance. However, the result set sizes of these queries are hard to control, and in practice it is difficult to balance the competing desires to kee...
Improved Skips for Faster Postings List Intersection
Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval system is fundamentally based on: Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...